In [81]:
from __future__ import division
import numpy as np
from scipy.stats import norm, t
from IPython.display import display, Math, Latex
from pandas import DataFrame

ltx = lambda s: display(Latex(s))

Example 2

Apple claims that its new Macbook has a better battery life than the current Macbook Air. The current Macbook Air is known to have a battery life of 12 hours. A random test of 10 new Macbooks yields the following battery life data (in hours).

13, 14, 10, 11, 12, 13, 11, 14, 15, 9

Perform a hypothesis test at the 1% level to determine the validity of Apple’s claim if:

i. We know from past experience that the standard deviation of the battery life is 1.5 hours.

ii. We have no other information about the standard deviation of the battery life.


In [82]:
data = np.array([13, 14, 10, 11, 12, 13, 11, 14, 15, 9])
sum_x = np.sum(data)
sum_x2 = np.sum(data ** 2)
avg = np.average(data)
ltx('$\\bar{X}$ = %.2f' % avg)


$\bar{X}$ = 12.20

In [83]:
# i. sigma known: one-sided z-test at the 1% level
z = (avg - 12) / (1.5 / np.sqrt(10))
z_crit = norm.ppf(0.99)  # upper-tail critical value for alpha = 0.01
ltx("$z = %.3f < z_{crit} = %.3f$" % (z, z_crit))
ltx("Do not reject $H_0$.")


$z = 0.422 < z_{crit} = 2.326$
Do not reject $H_0$.
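
The same decision can be reached through a p-value instead of a critical value. The sketch below assumes the one-sided hypotheses $H_0$: $\mu \leq 12$ against $H_1$: $\mu > 12$, which the problem statement implies but does not write out; the names `mu0`, `sigma` and `alpha` are introduced here for illustration only.

```python
import numpy as np
from scipy.stats import norm

data = np.array([13, 14, 10, 11, 12, 13, 11, 14, 15, 9])
mu0, sigma, alpha = 12.0, 1.5, 0.01  # assumed H0 mean, known sigma, test level

# Test statistic for a mean with known standard deviation
z = (data.mean() - mu0) / (sigma / np.sqrt(len(data)))

# Upper-tail probability P(Z > z) under H0
p_value = norm.sf(z)
reject = p_value < alpha

print("z = %.3f, p-value = %.3f, reject H0: %s" % (z, p_value, reject))
```

Comparing `p_value` with `alpha` is equivalent to comparing `z` with the critical value, since both describe the same upper-tail region.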

In [84]:
#ii.

ssx = sum_x2 - sum_x ** 2 / 10
ltx("SSx = %.2f" % ssx)
s = np.sqrt(ssx / 9)
ltx("$s_x = %.3f$" % s)
ltx("$s_\\bar{x} = %.3f$" % (s / np.sqrt(10)))


SSx = 33.60
$s_x = 1.932$
$s_\bar{x} = 0.611$

In [85]:
t_val = (avg - 12) / (s / np.sqrt(10))
t_crit = t.ppf(0.99, df=9)  # upper-tail critical value for alpha = 0.01
ltx("$t = %.3f < t_{crit} = %.3f$" % (t_val, t_crit))
ltx("Do not reject $H_0$.")


$t = 0.327 < t_{crit} = 2.821$
Do not reject $H_0$.
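
As a cross-check on the hand computation, the sample standard deviation can be taken directly from NumPy and the one-sided p-value from the $t$ distribution with $n - 1$ degrees of freedom. This is a minimal sketch under the same assumed hypotheses ($H_1$: $\mu > 12$); `mu0` and `alpha` are illustrative names.

```python
import numpy as np
from scipy.stats import t

data = np.array([13, 14, 10, 11, 12, 13, 11, 14, 15, 9])
mu0, alpha = 12.0, 0.01  # assumed H0 mean and test level
n = len(data)

# ddof=1 gives the sample standard deviation sqrt(SSx / (n - 1))
s = data.std(ddof=1)
t_val = (data.mean() - mu0) / (s / np.sqrt(n))

# Upper-tail probability under a t distribution with n - 1 degrees of freedom
p_value = t.sf(t_val, df=n - 1)

print("s = %.3f, t = %.3f, p-value = %.3f" % (s, t_val, p_value))
```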

In [86]:
# iii. 90% two-sided confidence interval for the mean (sigma unknown)

t_crit = t.ppf(0.95, df=9)
ltx("$%.2f < \\mu < %.2f$" % (avg - t_crit * s / np.sqrt(10), avg + t_crit * s / np.sqrt(10)))


$11.08 < \mu < 13.32$
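
The interval above uses the 0.95 quantile on each side, so it is a 90% two-sided interval. A short sketch of the same computation with SciPy's built-in helpers follows; it assumes `scipy.stats.sem` with its default `ddof=1`, which corresponds to $s / \sqrt{n}$.

```python
import numpy as np
from scipy.stats import sem, t

data = np.array([13, 14, 10, 11, 12, 13, 11, 14, 15, 9])

# sem() defaults to ddof=1, i.e. the sample standard error s / sqrt(n)
lower, upper = t.interval(0.90, df=len(data) - 1,
                          loc=data.mean(), scale=sem(data))

print("%.2f < mu < %.2f" % (lower, upper))
```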

Example 3

An article in Nature described an experiment to determine the effect of eating chocolate on a measure of cardiovascular health. 12 subjects consumed 100 grams of dark chocolate and 100 grams of milk chocolate, one type of chocolate per day, and after one hour, the total antioxidant capacity of their blood plasma was measured in an assay. The results are in the table. Is there evidence to support the claim that consuming dark chocolate produces a higher mean level of total blood plasma antioxidant capacity than consuming milk chocolate?


In [115]:
dark = [118.8, 122.6, 115.6, 113.6, 119.5, 115.9]
milk = [102.1, 105.8, 99.6, 102.7, 98.8, 100.9]

i. Let $X$ and $Y$ be the antioxidant capacities after dark and milk chocolate respectively. We treat the observations as paired and define $D = X - Y$. The hypotheses are then

$H_0$: $\mu_D \leq 0$

$H_1$: $\mu_D > 0$


In [116]:
d = DataFrame({"X": dark, "Y": milk})
d["D"] = d["X"] - d["Y"]
d["D2"] = d["D"] ** 2
print(d)


       X      Y     D      D2
0  118.8  102.1  16.7  278.89
1  122.6  105.8  16.8  282.24
2  115.6   99.6  16.0  256.00
3  113.6  102.7  10.9  118.81
4  119.5   98.8  20.7  428.49
5  115.9  100.9  15.0  225.00

In [117]:
sums = d.sum()
n = len(d["D"])
print(sums)


X      706.00
Y      609.90
D       96.10
D2    1589.43
dtype: float64

In [118]:
dbar = sums["D"] / n
ltx("$\\bar{D} = %.2f$" % dbar)
ssd = sums["D2"] - sums["D"] ** 2 / n
ltx("$SS_D = %.2f$" % ssd)
sd = np.sqrt(ssd / (n - 1))  # sample standard deviation (square root of the variance)
ltx("$s_D = %.2f$" % sd)
s = sd / np.sqrt(n)  # standard error of the mean difference
ltx("$s_\\bar{D} = %.2f$" % s)


$\bar{D} = 16.02$
$SS_D = 50.23$
$s_D = 3.17$
$s_\bar{D} = 1.29$

In [119]:
t_val = dbar / s
t_crit = t.ppf(0.95, df=5)
ltx("$t = %.3f > t_{crit} = %.3f$" % (t_val, t_crit))
ltx("Reject $H_0$.")


$t = 12.378 > t_{crit} = 2.015$
Reject $H_0$.
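
The paired statistic can be cross-checked against `scipy.stats.ttest_rel`, which works on the differences $D = X - Y$ internally. A hedged sketch: `ttest_rel` reports a two-sided p-value by default, so for the one-sided alternative $\mu_D > 0$ the p-value is halved here, which is valid only because the statistic is positive.

```python
from scipy.stats import ttest_rel

dark = [118.8, 122.6, 115.6, 113.6, 119.5, 115.9]
milk = [102.1, 105.8, 99.6, 102.7, 98.8, 100.9]

# Paired t-test on D = dark - milk; the returned p-value is two-sided
t_stat, p_two_sided = ttest_rel(dark, milk)

# One-sided p-value for H1: mu_D > 0 (valid here because t_stat > 0)
p_one_sided = p_two_sided / 2

print("t = %.3f, one-sided p-value = %.6f" % (t_stat, p_one_sided))
```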

ii. Now, we formulate the following hypotheses

$H_0$: $\mu_X - \mu_Y = 0$

$H_1$: $\mu_X - \mu_Y \ne 0$


In [120]:
d["X2"] = d["X"] ** 2
d["Y2"] = d["Y"] ** 2
sums = d.sum()

xbar = sums["X"] / n
ltx("$\\bar{X} = %.2f$" % xbar)
ybar = sums["Y"] / n
ltx("$\\bar{Y} = %.2f$" % ybar)
diff = xbar - ybar
ltx("$\\bar{X} - \\bar{Y} = %.2f$" % diff)

ssx = sums["X2"] - sums["X"] ** 2 / n
ssy = sums["Y2"] - sums["Y"] ** 2 / n
ltx("$SS_X = %.2f$" % ssx)
ltx("$SS_Y = %.2f$" % ssy)
sp2 = (ssx + ssy) / (n + n - 2)
ltx("$s_p^2 = %.2f$" % sp2)


$\bar{X} = 117.67$
$\bar{Y} = 101.65$
$\bar{X} - \bar{Y} = 16.02$
$SS_X = 52.91$
$SS_Y = 31.41$
$s_p^2 = 8.43$

In [121]:
t_val = diff / np.sqrt(2 * sp2 / n)
t_crit = t.ppf(0.975, df=(n + n - 2))
ltx("$t = %.3f > t_{crit} = \\pm %.3f$" % (t_val, t_crit))
ltx("Reject $H_0$.")


$t = 9.553 > t_{crit} = \pm 2.228$
Reject $H_0$.
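
The pooled-variance computation corresponds to an independent two-sample $t$-test with equal variances assumed. As a minimal cross-check, `scipy.stats.ttest_ind` with `equal_var=True` uses the same pooled estimate $s_p^2$ and $n + n - 2$ degrees of freedom, and returns a two-sided p-value that matches the two-sided hypotheses above.

```python
from scipy.stats import ttest_ind

dark = [118.8, 122.6, 115.6, 113.6, 119.5, 115.9]
milk = [102.1, 105.8, 99.6, 102.7, 98.8, 100.9]

# equal_var=True selects the pooled-variance (Student) two-sample t-test
t_stat, p_value = ttest_ind(dark, milk, equal_var=True)

print("t = %.3f, two-sided p-value = %.6f" % (t_stat, p_value))
```

Note that this test treats the two samples as independent, whereas part i. treats the same measurements as paired; the two approaches generally give different statistics.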
